Unassigned complex unique recombinant forms related to CRF36_cpx in children identified in an HIV-1 outbreak in Pakistan

In 2019, an outbreak of HIV infection predominantly affecting children occurred in Larkana district, Pakistan. This is the largest outbreak ever reported in this age group in Pakistan. In this study, we report two HIV-1 unique recombinant forms identified within the outbreak. Blood samples were collected from HIV-positive children as part of a case-control study to investigate the outbreak. The pol gene was sequenced and used to identify HIV-1 subtypes and recombinant forms through subtyping, recombination, and phylogenetic analyses. Drug resistance mutation (DRM) analysis was performed to characterize the DRMs in each sequence. We observed two unassigned unique recombinant forms (URFs) related to CRF36_cpx in 15 of 344 samples. Genotypic analysis revealed multiple DRMs associated with resistance to reverse transcriptase inhibitors. The discovery of these unassigned URFs in our population highlights the need for comprehensive molecular epidemiological studies to fully understand the distribution of HIV-1 genotypes and drug resistance patterns and to aid control efforts.


Introduction
HIV-1 is a remarkably diverse retrovirus, with Group M of HIV-1 classified into nine subtypes: A-D, F-H, J, and K. 1 The subtypes of HIV-1 also differ in their geographical distribution. 1 For instance, subtype B is predominant in America, Australia, and Europe; subtype A is dominant across East Africa; and in West Africa, almost all other known subtypes are circulating, with CRF02_AG and subtype G predominating. Similarly, subtype C is the predominant subtype in Southern Africa, Ethiopia, and India. 2 Subtype A1 is predominant in Pakistan, followed by subtype B, CRF02_AG, and A1G. 3 Infection with multiple HIV strains can occur in an individual with repeated exposure, e.g. due to high-risk sexual behavior, and can result in the emergence of new or complex recombinant forms. 4 Over 100 circulating and unique recombinant forms (CRFs and URFs) of HIV-1 have been identified. Approximately 20% of HIV-1 infections worldwide are caused by URFs, while in some African countries this proportion is as high as 40%. 5 In Pakistan, HIV-1 currently exists as a concentrated epidemic in certain high-risk groups, such as persons who inject drugs (PWID), men who have sex with men (MSM), and transgender sex workers (Hijra sex workers), with frequent spillovers into low-risk populations in the form of isolated outbreaks. [6][7][8][9] Molecular epidemiological studies from Pakistan have shown the presence of diverse subtypes and URFs in the country, namely B, C, D, G, A1D, A1G, 01G, CG, 01_AE, CRF02A1, CRF02_AG, and CRF35_AD, while subtype A1 is the predominant subtype. 3,10 In addition to these known subtypes, several new and unique CRFs and URFs have been reported: URF_DG, 11 CRF56_cpx, and CRF02_A1. 8,12 In April 2019, an outbreak of HIV-1 infection predominantly affecting children was reported in Larkana district, Sindh province, Pakistan.
This is the largest outbreak ever reported in this age group in Pakistan, with more than 1000 children testing positive for HIV-1. 13 As part of the investigation of the Larkana outbreak, epidemiological and molecular studies were carried out. 14 In this study, we report two unassigned URFs related to CRF36_cpx from individuals identified with HIV infection in the outbreak. The first possibly emerged from recombination between CRF02_AG and CRF36_cpx, and the second from recombination among CRF02_AG, CRF36_cpx, and subtype H.

Methods
As part of a case-control study to investigate the outbreak, blood samples were collected from children aged below 15 years diagnosed with HIV-1 infection (cases) in Larkana, and from uninfected age- and sex-matched community controls. 15 The samples were collected after obtaining informed consent from the participants and/or their guardians. 16 A questionnaire was used to obtain demographics and relevant risk factor information from the study participants. The study was approved by the Aga Khan University Ethical Review Committee (ERC# 2019-1536-4200), and all experiments were performed in accordance with relevant guidelines and regulations. In this study, we focus on two unassigned complex URFs. The findings of the complete molecular epidemiological analysis will be reported separately.
The obtained pol sequences were subtyped using the HIV-1 REGA subtyping tool, while the jpHMM tool was used to detect recombination events and recombination breakpoints within the pol gene of the HIV-1 genome. 12,13,15,17 Strain identity was investigated using phylogenetic analysis. For this, a multiple sequence alignment (MSA) containing the outbreak sequences (ID: AKULO) as well as HIV-1 subtype reference sequences obtained from the Los Alamos HIV Sequence Database was generated using the MAFFT program. 18,19 For one sequence (AKULO_387), for which two recombination breakpoints were predicted by jpHMM, two separate MSAs were generated. The first and second MSAs of the AKULO_387 sequence comprised the split nucleotide segments spanning positions 2236-3192 and 3193-3313 (with reference to the HXB2 genome), respectively. These two MSAs were used for separate phylogenetic analyses.
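The coordinate-based split of the AKULO_387 sequence can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the sequence is ungapped relative to HXB2 and that its first nucleotide corresponds to HXB2 position 2236. The function name split_by_hxb2 and the placeholder sequence are hypothetical and are not part of the study's pipeline.

```python
def split_by_hxb2(seq, first_pos, segments):
    """Slice seq into segments given as inclusive (start, end) HXB2 coordinates.

    Assumes seq is ungapped relative to HXB2 and that seq[0] corresponds
    to HXB2 position first_pos (an illustrative simplification).
    """
    last_pos = first_pos + len(seq) - 1
    parts = []
    for start, end in segments:
        if start < first_pos or end > last_pos or start > end:
            raise ValueError(
                f"segment {start}-{end} lies outside {first_pos}-{last_pos}"
            )
        # Convert inclusive 1-based HXB2 coordinates to a 0-based slice.
        parts.append(seq[start - first_pos : end - first_pos + 1])
    return parts


# Segment coordinates as reported for AKULO_387.
SEGMENTS = [(2236, 3192), (3193, 3313)]

# Placeholder sequence of matching length; not real outbreak data.
placeholder = "A" * (3313 - 2236 + 1)
first, second = split_by_hxb2(placeholder, 2236, SEGMENTS)
print(len(first), len(second))
```

Under these assumptions the two segments are 957 and 121 nucleotides long. In practice the coordinates would be mapped through an alignment to HXB2 rather than by direct offset, since insertions or deletions shift positions.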

Results
Of the 344 samples from cases, unassigned URFs were observed in 15. The median age of the 15 participants was 2.8 years (range: 0.8-9 years) and 33% were female. At the time of sample collection, two participants were ART-naïve, and the remainder had only recently started ART (Table 1). Subtyping analysis identified the 15 sequences as CRF02_AG with undefined recombination related to CRF36_cpx. On phylogenetic analysis, 14 sequences clustered with CRF36_cpx sequences submitted from Cameroon (accession numbers: GU366128, EF087994, and EF087995) and CRF02_AG sequences submitted from Liberia and Nigeria (accession numbers: AB485636 and L39106, respectively) (Figure 1A). The cluster containing these 14 sequences had a strong node support value of 0.87 (Figure 1A), indicating that these sequences are an unassigned recombinant form related to CRF36_cpx, possibly emerging from recombination between CRF02_AG and CRF36_cpx. The Cameroonian URF (accession number: GU366128) that clustered most closely with the 14 sequences generated in our study (Figure 1A, black arrow) was previously reported from a 2004 cohort study, in which the authors identified this strain as CRF02_AG containing a CRF36_cpx recombination at the 5′ end of the pol region. 20 Interestingly, when we included this sequence in our phylogenetic tree, it became part of the larger cluster (node support value 0.87) that contained the outbreak sequences as well as CRF02_AG and CRF36_cpx sequences (Figure 1A), further supporting that the 14 outbreak sequences are an unassigned URF related to CRF36_cpx.
For the remaining sequence (ID: AKULO_387), the phylogenetic analysis was performed using the two MSAs constructed from the breakpoints predicted by jpHMM. On the phylogenetic tree based on the first MSA, this sequence clustered with CRF02_AG sequences submitted from Cameroon, Nigeria, and Liberia (accession numbers: AY271690, L39106, and AB485636, respectively) and CRF36_cpx sequences submitted from Cameroon (accession numbers: EF087994-EF087995), with a node support value of 0.99 (Figure 1B). On the phylogenetic tree based on the second MSA, the sequence clustered with a subtype H sequence submitted from Belgium (accession number: AF190127), with a node support value of 0.90 (Figure 1C), indicating the strain to be a subtype H-like unassigned complex URF related to CRF36_cpx.
The branch length of one of the 14 sequences in the cluster (ID: AKULO_248; Figure 1A) suggested the presence of additional recombination event(s) that were not predicted by any of the tools. The phylogenetic analysis of this sequence showed clustering with CRF18_cpx in addition to CRF02_AG and CRF36_cpx (Figure 2), suggesting additional recombination; however, this warrants further investigation. On genotypic analysis, six of the fifteen (40%) sequences contained mutations associated with resistance to antiretroviral drugs (Table 1): the T215N, M184V, and Y115F mutations are associated with resistance to nucleoside reverse transcriptase inhibitors (NRTIs), and the K103N, Y181C, A98G, and V179L mutations are associated with resistance to non-nucleoside reverse transcriptase inhibitors (NNRTIs). No protease inhibitor resistance mutations were observed. 19 A similar analysis of the reference sequences that clustered with our outbreak sequences (Figure 1) showed that none of them harbored any DRM, except the subtype H sequence (accession number: AF190127, Figure 1C), which had the DRMs D67N, K70R, and K219Q, associated with resistance to all NRTIs.
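The grouping of mutations by drug class and the proportions reported above can be expressed as a small lookup and a percentage helper. The following Python sketch is illustrative only: the class assignments are those stated in the text, whereas the function names classify_drms and percent and the example mutation list are hypothetical and do not reproduce Table 1.

```python
# Drug-class assignments exactly as stated in the text above.
DRM_CLASS = {
    "T215N": "NRTI", "M184V": "NRTI", "Y115F": "NRTI",
    "K103N": "NNRTI", "Y181C": "NNRTI", "A98G": "NNRTI", "V179L": "NNRTI",
}


def classify_drms(mutations):
    """Group a sample's mutations by drug class; unlisted ones go under 'other'."""
    grouped = {}
    for mut in mutations:
        grouped.setdefault(DRM_CLASS.get(mut, "other"), []).append(mut)
    return grouped


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical mutation list for a single sample (not taken from Table 1).
example = ["M184V", "K103N", "Y181C"]
print(classify_drms(example))

print(percent(6, 15))    # sequences with DRMs among the 15 URF sequences
print(percent(15, 344))  # URF sequences among all case samples
```

With the counts given in the text, the helper returns 40.0 for 6 of 15 sequences and 4.4 for 15 of 344 samples. A real analysis would rely on a curated resistance database rather than a hand-written table.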

Discussion
The current study reports the detection of novel unassigned URFs related to CRF36_cpx in children identified with HIV-1 during the Larkana outbreak. CRF02_AG is one of the main circulating strains in Pakistan; however, the current analysis suggests that it is recombining with other subtypes/CRFs, leading to the emergence of subtype H-like or CRF36_cpx-related URFs. CRF36_cpx is limited to Cameroon and has been found to recombine with CRF02_AG. 20 Subtype H has only been reported in Central Africa and the United Kingdom. 19,21,22 Recombination of subtype H with multiple other subtypes and CRFs has been observed, including recombination with CRF04_cpx, CRF27_cpx, and complex recombination with U/CRF02_AG. 19,23 One limitation of this study is the use of the pol sequence alone for the assignment of the recombinant form. Analysis based on longer regions of the HIV-1 genome or the whole genome, which could not be performed due to funding constraints, would have been useful to accurately determine the recombinant form and the region(s) of recombination. Nonetheless, our analysis supports the presence of subtype H-like or CRF36_cpx-related URFs, which suggests that the HIV epidemic in Pakistan is much more dynamic and complex than previously thought. Some of these strains may become more adaptive and emerge as major forms in the future.
Genotypic analysis of the sequences identified DRMs associated with resistance to multiple reverse transcriptase inhibitors. Three of the 13 ARV-experienced individuals harbored mutations associated with high-level resistance to NRTIs and/or NNRTIs (M184V, K103N, Y115F, and Y181C). Two of the 13 individuals also had the mutations T215N and Y179L, associated with resistance against zidovudine, and abacavir and tenofovir, respectively, which are the first-line regimen in Pakistan. The mutation T215N is a revertant mutation; its presence suggests that the individual had acquired HIV-1 whose majority population carried T215Y/F (a highly resistant NRTI mutation). 9 Similarly, the mutation A98G (observed in 2 individuals) is associated with resistance to multiple NNRTIs. No DRMs were observed in ARV-naïve individuals. Although all but two individuals were ART-experienced, they had started ART only recently (mean ART duration: 1 month; range: 8 days to 3 months). Therefore, these mutations are unlikely to have emerged as a result of suboptimal adherence and may instead be due to transmitted drug resistance. 24,25 This may have implications for the treatment and control of HIV in Pakistan.
The presence of multiple drug resistance mutations in these strains, especially to first-line ART drugs, is alarming as it limits treatment options. Large-scale transmission of resistant strains can hamper efforts to control the spread of the HIV epidemic in Pakistan, where second-line drugs are not easily available. The discovery of multiple URFs/CRFs in this outbreak highlights the need for comprehensive molecular epidemiological studies and molecular surveillance to understand the distribution of different genotypes as well as origin, transmission, and drug resistance patterns. This will inform the appropriate treatment of individuals with HIV and strategies for preventing further outbreaks and controlling the spread of the HIV epidemic in the country.